##Part 1: Spatial Analysis
Understanding the spatial distribution of large-scale solar footprints (i.e., rooftop, parking lot, or ground solar) and socioeconomic conditions in California matters for both social and environmental reasons. Knowing where large-scale solar facilities are located and how they overlap with areas of high minority population can help identify environmental justice concerns, inform policymakers’ decisions, and ensure renewable energy efforts are equitably distributed. To examine the spatial distribution of solar footprints in California and the size of nearby minority populations, the following hypotheses were tested:
Hypothesis: the county in California with the largest area in acres of large-scale solar footprints will be in a rural area
Hypothesis: Counties with high minority populations will have larger areas in acres of solar footprints than counties with low minority populations.
##Part 2: Relationship between Energy and prices
Solar power is growing rapidly across the United States and is becoming increasingly integrated into the electricity system. In particular, small-scale solar photovoltaic (PV) installations can help bring down the cost of electricity for the households and communities that install them. Because their cost of electricity production is near zero, they not only reduce the electricity these users buy from the grid but also allow them to sell the extra electricity they generate back to the grid. The capital cost of installing solar panels and storage is the biggest barrier to the uptake of rooftop solar, so states like California, which leads this transition with ambitious clean energy and emissions reduction targets in law, have introduced policies such as net metering and subsidies to support uptake. In this study, we pursue the following research question:
• What is the relationship between annual distributed solar PV capacity and generation (total values in the residential, commercial and industrial sectors) and electricity prices between 2015 and 2021?
To explore this question, we fit general linear models (GLMs) to test whether there is a significant relationship between prices and each of the two variables over the entire duration of the dataset.
#Part 1: Spatial Analysis
Finding the Data:
For the spatial analysis portion of this project, the following datasets were used: (I) Solar Footprints in California GeoJSON data from the California Energy Commission; (II) 2018 Social Vulnerability Index CSV file from the Agency for Toxic Substances and Disease Registry; and (III) a USA Counties Shapefile filtered for California counties. The solar footprints dataset was found through the Environmental Data Initiative using the Google Dataset Search by typing in “solar panel locations in California”. The last two datasets were used in class but filtered for California.
Explaining the data:
The solar footprint feature class is a dataset of medium- to large-scale solar facility footprints throughout California, interpreted from imagery, and was last updated in August 2023. The feature class consists of polygons that represent solar footprints and were digitized from imagery. The imagery for this dataset was obtained from Esri World Imagery, USGS National Agriculture Imagery Program (NAIP), and 2020 SENTINEL 2 Satellite Imagery, 2023. This dataset includes solar facilities with larger footprints, such as large rooftops and parking lot structures, but does not contain information on small-scale solar, such as residential footprints. Specifically, it includes rooftop solar on large buildings, parking lot solar greater than 1 acre or clustered, and ground solar greater than 1 acre or clustered. The features were then classified into urban and rural areas by applying the rural definition in 42 U.S. Code § 1490. The total footprint in this dataset is 129,742 acres. The dataset is provided as GeoJSON, a format that allows us to access the data directly without requiring downloads.
The CDC Social Vulnerability Index (CDC SVI) is a tool created by the Agency for Toxic Substances and Disease Registry (ATSDR) to help public health officials and emergency response planners identify and map communities that will most likely need support before, during, and after a hazardous event. The index uses 15 social factors, such as unemployment, minority status, and disability, and groups them into four categories. The categories are socioeconomic status (unemployed, below the poverty line, income level, and high school diploma status), household composition and disability (aged 65 or older, aged 17 or younger, disability status, marital status), minority status and language abilities, and housing and transportation type. Each Census tract receives a ranking for each theme and an overall ranking, where higher values indicate greater vulnerability.
The USA Counties shapefile filtered for California contains attribute information on STATEFP, COUNTYFP, COUNTYNS, AFFGEOID, GEOID, NAME, LSAD, ALAND, AWATER, and a geometry. The STATEFP and COUNTYFP are state and county specific codes, respectively.
Data Wrangling Methods:
To import the Solar Footprints GeoJSON and USA Counties shapefile datasets into R, we used the “sf” package and its “st_read()” function. The Social Vulnerability Index file was read in using the “read.csv()” function from the “utils” package. The USA Counties shapefile contains all states and their counties in the US; however, our research question focuses on California only. To target California, we used the “dplyr” package and its “filter()” function to isolate records with the state code “06”, which is California’s STATEFP code. For the Solar Footprint and Social Vulnerability data, we wrangled each dataset using a pipe and the “dplyr” “select()” function. For the Solar Footprint data, we selected the county name, type of solar panel, urban or rural status, and geometry. For the Social Vulnerability data, we selected the county, FIPS, location, total population, population in poverty, and minority population. The Social Vulnerability dataset also required converting the FIPS code to a factor.
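The filter, select, and join-key steps described above can be sketched on toy data. This is a minimal illustration rather than the project's actual script: the data frames `counties_toy` and `svi_toy` are hypothetical stand-ins without geometry, and the sketch assumes the FIPS column was read as an integer, in which case the leading zero has to be restored before the key can match a `GEOID` such as "06001".

```r
library(dplyr)

# Toy stand-in for the counties attribute table (hypothetical rows, no geometry)
counties_toy <- data.frame(
  STATEFP = c("06", "06", "32"),
  GEOID   = c("06001", "06003", "32001"),
  NAME    = c("Alameda", "Alpine", "Churchill"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the SVI table; FIPS is assumed to have been read as an
# integer here, which drops the leading zero
svi_toy <- data.frame(
  FIPS     = c(6001L, 6003L),
  COUNTY   = c("Alameda", "Alpine"),
  E_TOTPOP = c(1643700L, 1146L),
  E_MINRTY = c(1120309L, 468L),
  stringsAsFactors = FALSE
)

# Keep California records only and the columns needed for the join
ca_counties_toy <- counties_toy %>%
  filter(STATEFP == "06") %>%
  select(GEOID, NAME)

# Restore the five-character FIPS code so it matches GEOID
svi_clean <- svi_toy %>%
  mutate(FIPS = formatC(FIPS, width = 5, flag = "0")) %>%
  select(FIPS, COUNTY, E_TOTPOP, E_MINRTY)

# Join the SVI attributes to the county records on the shared code
joined_toy <- merge(ca_counties_toy, svi_clean, by.x = "GEOID", by.y = "FIPS")
```

If the FIPS column is already stored as a five-character string or factor, the padding step is unnecessary and the join works directly.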
Table 1: Data Structure Summary - Spatial Analysis
| Detail | Description |
|---|---|
| Data Source | Solar_Footprints_V2_-1101629766070969057 (1).geojson, accessed April 1, 2024 - https://cecgiscaenergy.opendata.arcgis.com/datasets/9398e39a0424434b9e95ccf8e8938807_0/explore?location=36.209468%2C-119.976002%2C8.96 |
| Retrieved from | ArcGIS Online |
| Variables Used | Solar footprint type (rooftop, parking lot, or ground), urban or rural footprint, solar footprint area, geometry |
| Units Used | acres |
| Data Range | 5 variables 5397 observations |
| Minimum Value | 0.0004131393 |
| Maximum Value | 7438.6033 |
Table 2: Data Structure Summary - Spatial Analysis
| Detail | Description |
|---|---|
| Data Source | USA Counties Shapefile filtered for California counties - dataset provided by class |
| Retrieved from | ENVI872 Class |
| Variables Used | STATEFP, COUNTYFP, COUNTYNS, GEOID, NAME, GEOMETRY |
| Units Used | County Code |
| Data Range | 10 variables, 3220 observations |
| Minimum Value | NA |
| Maximum Value | NA |
Table 3: Data Structure Summary - Spatial Analysis
| Detail | Description |
|---|---|
| Data Source | 2018 Social Vulnerability Index CSV file from Agency for Toxic Substances and Disease Registry, accessed April 1, 2024 - https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html |
| Retrieved from | 2018 Social Vulnerability Index CSV file from Agency for Toxic Substances and Disease Registry |
| Variables Used | COUNTY - county name, character; FIPS - county code, factor; LOCATION - county name, character; E_TOTPOP - estimated total population, integer; E_POV - estimated population below the poverty line, integer; E_MINRTY - estimated minority population, integer |
| Units Used | people per county |
| Data Range | 6 variables 58 observations |
| Minimum Value | E_TOTPOP: 1146, E_POV: 227, E_MINRTY: 468 |
| Maximum Value | E_TOTPOP: 10098052, E_POV: 1589956, E_MINRTY: 7439000 |
##Part 2: Relationship between Energy and prices
Variables of interest: We referred to the vast amount of literature available on renewable energy in California to acquire data on our variables of interest. These included:
Independent variables: Quarterly Solar PV Generation (MWh) (2015-2021) | Quarterly Solar PV Capacity (2015-2021), both broken down by residential, commercial, industrial and total. Source: https://www.eia.gov/electricity/data/eia861m/
Dependent variables: Prices of installing solar panels (<10 MW) on a quarterly basis (2015-2021). Source: https://www.californiadgstats.ca.gov/charts/ Note: Since solar PV has no generation costs, we look at CAPEX costs.
Table 4 : Data Structure Summary - Energy Data
| Detail | Description |
|---|---|
| Data Source | EIA, CaliforniaDGStats |
| Retrieved from | https://www.eia.gov/electricity/data/eia861m/, https://www.californiadgstats.ca.gov/charts/ |
| Variables Used | Distributed Solar PV (< 10 kW) Prices, Generation and Capacity |
| Units Used | $/Watt, MWh, MW |
| Data Minimum Value | Price: $4.48/Watt (September 2019) |
| Data Maximum Value | Price: $5.49/Watt (January 2015) |
Wrangling the Data:
The price data was only available on a quarterly basis for California, whereas the generation and capacity datasets (hereafter referred to as the energy datasets) were much larger and more detailed. To make them comparable, we wrangled the datasets using the following steps:
(1) Imported the data on generation and capacity of distributed solar energy in California into R.
(2) Filtered the data to keep only March, June, September and December, matching the price data time stamps.
(3) Mutated the date column into the right format and object type using pipes and lubridate.
(4) Rounded the values and retained only the columns with the variables of interest using select().
(5) Imported the price data and subset it to 2015-2021 to match the timeframe of the energy data. This data was already quarterly.
(6) Mutated the price date column into the right format and object type using pipes and lubridate.
(7) Combined the two datasets into one dataframe with generation and capacity of solar (>kW) in the industrial, residential and commercial sectors, along with the corresponding price for each time stamp. The resulting data is quarterly and spans 2015-2021.
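The quarter-matching and combining steps can be sketched on toy data as follows. This is an illustrative sketch, not the study's script: it uses base R date handling instead of lubridate, the table names `energy_toy` and `price_toy` are hypothetical, and the capacity values are made up. It joins the two tables on the `Date` key, which matches rows by value rather than by position.

```r
# Toy monthly energy table (hypothetical capacity values)
energy_toy <- data.frame(
  Year      = rep(2015, 12),
  Month     = 1:12,
  Cap_Total = seq(3000, 4100, by = 100)
)

# Toy quarterly price table keyed by first-of-month dates
price_toy <- data.frame(
  Date   = as.Date(c("2015-03-01", "2015-06-01", "2015-09-01", "2015-12-01")),
  Prices = c(4.67, 4.48, 4.33, 4.27)
)

# Keep the quarter-end months and build a matching Date column
energy_q <- energy_toy[energy_toy$Month %in% c(3, 6, 9, 12), ]
energy_q$Date <- as.Date(sprintf("%d-%02d-01", energy_q$Year, energy_q$Month))

# Join on Date so each quarter is paired with its own price
energy_price_toy <- merge(energy_q[, c("Date", "Cap_Total")], price_toy, by = "Date")
```

Joining on the date guards against silent misalignment if the two tables are ever sorted differently or one of them is missing a quarter.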
#GIS COMPONENT
#Filtering the data to only include California counties
ca_counties_sf <- counties_sf[counties_sf$STATEFP == "06", ]
#Join the SVI attributes to the county spatial features
counties_sf_join <- merge(x = ca_counties_sf,
y = svi2018_CA_raw,
by.x = "GEOID",
by.y = "FIPS" )
glimpse(counties_sf_join)
## Rows: 58
## Columns: 15
## $ GEOID <chr> "06001", "06003", "06005", "06007", "06009", "06011", "06013"…
## $ STATEFP <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ COUNTYFP <chr> "001", "003", "005", "007", "009", "011", "013", "015", "017"…
## $ COUNTYNS <chr> "01675839", "01675840", "01675841", "01675842", "01675885", "…
## $ AFFGEOID <chr> "0500000US06001", "0500000US06003", "0500000US06005", "050000…
## $ NAME <chr> "Alameda", "Alpine", "Amador", "Butte", "Calaveras", "Colusa"…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ ALAND <dbl> 1909598013, 1912292630, 1539933577, 4238438186, 2641829200, 2…
## $ AWATER <dbl> 216923745, 12557304, 29470567, 105311003, 43797659, 14608870,…
## $ COUNTY <chr> "Alameda", "Alpine", "Amador", "Butte", "Calaveras", "Colusa"…
## $ LOCATION <chr> "Alameda County, California", "Alpine County, California", "A…
## $ E_TOTPOP <int> 1643700, 1146, 37829, 227075, 45235, 21464, 1133247, 27424, 1…
## $ E_POV <int> 170884, 227, 3323, 44410, 5242, 2929, 102543, 5458, 16537, 23…
## $ E_MINRTY <int> 1120309, 468, 8066, 62685, 8330, 13792, 630296, 10252, 40671,…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-122.3337 3..., MULTIPOLYGON (((…
head(counties_sf_join, 10)
## Simple feature collection with 10 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.245 ymin: 35.90719 xmax: -118.3606 ymax: 41.99917
## Geodetic CRS: NAD83
## GEOID STATEFP COUNTYFP COUNTYNS AFFGEOID NAME LSAD ALAND
## 1 06001 06 001 01675839 0500000US06001 Alameda 06 1909598013
## 2 06003 06 003 01675840 0500000US06003 Alpine 06 1912292630
## 3 06005 06 005 01675841 0500000US06005 Amador 06 1539933577
## 4 06007 06 007 01675842 0500000US06007 Butte 06 4238438186
## 5 06009 06 009 01675885 0500000US06009 Calaveras 06 2641829200
## 6 06011 06 011 01675902 0500000US06011 Colusa 06 2980332864
## 7 06013 06 013 01675903 0500000US06013 Contra Costa 06 1857233047
## 8 06015 06 015 01682074 0500000US06015 Del Norte 06 2606117983
## 9 06017 06 017 00277273 0500000US06017 El Dorado 06 4423290468
## 10 06019 06 019 00277274 0500000US06019 Fresno 06 15431404077
## AWATER COUNTY LOCATION E_TOTPOP E_POV
## 1 216923745 Alameda Alameda County, California 1643700 170884
## 2 12557304 Alpine Alpine County, California 1146 227
## 3 29470567 Amador Amador County, California 37829 3323
## 4 105311003 Butte Butte County, California 227075 44410
## 5 43797659 Calaveras Calaveras County, California 45235 5242
## 6 14608870 Colusa Colusa County, California 21464 2929
## 7 225282636 Contra Costa Contra Costa County, California 1133247 102543
## 8 578742642 Del Norte Del Norte County, California 27424 5458
## 9 203328472 El Dorado El Dorado County, California 186661 16537
## 10 137345152 Fresno Fresno County, California 978130 232067
## E_MINRTY geometry
## 1 1120309 MULTIPOLYGON (((-122.3337 3...
## 2 468 MULTIPOLYGON (((-120.0725 3...
## 3 8066 MULTIPOLYGON (((-121.0275 3...
## 4 62685 MULTIPOLYGON (((-122.0449 3...
## 5 8330 MULTIPOLYGON (((-120.9955 3...
## 6 13792 MULTIPOLYGON (((-122.7851 3...
## 7 630296 MULTIPOLYGON (((-122.4253 3...
## 8 10252 MULTIPOLYGON (((-124.2196 4...
## 9 40671 MULTIPOLYGON (((-121.141 38...
## 10 686675 MULTIPOLYGON (((-120.656 36...
#ENERGY COMPONENT
#Price data is quarterly so using electricity data for March, June, September & December
CA_DSP_Energy <- CA_DSP_Energy_Raw[CA_DSP_Energy_Raw$Month %in% c(3, 6, 9, 12),]
#Wrangling CA_DSP_Energy Dataset
CA_DSP_Energy <- CA_DSP_Energy %>%
mutate(Date = paste(Year, Month, sep="-"))%>%
select(Date, 5:12)
#Converting Date column to date object
CA_DSP_Energy$Date <- ym(CA_DSP_Energy$Date)
#Rounding the decimal points
CA_DSP_Energy[,c(2:9)]<-round(CA_DSP_Energy[,c(2:9)])
#QUARTERLY ELECTRICITY PRICES FOR SOLAR PV >=10 KW IN CA
#Importing the data
CA_DSP_Prices_Raw <- read_excel("EnergyData/NEMPVPriceData2015-22.xlsx")
#Keeping the 28 quarterly rows for 2015-2021 (rows 3 to 30) to match the energy data
CA_DSP_Prices <- CA_DSP_Prices_Raw[3:30,]
#Cleaning the dates
Price_Month <- rep(c(3,6,9,12),7)
CA_DSP_Prices <- cbind(CA_DSP_Prices, Price_Month)
CA_DSP_Prices$Category <- substr(CA_DSP_Prices$Category, start=1, stop=4)
CA_DSP_Prices$Date <- paste0(CA_DSP_Prices$Category,"-",CA_DSP_Prices$Price_Month)
CA_DSP_Prices$Date <- ym(CA_DSP_Prices$Date)
#Removing column with prices for capacity <10kW because not relevant to our analysis
CA_DSP_Prices <- CA_DSP_Prices %>%
select(c(5,3))
colnames(CA_DSP_Prices) <- c("Date", "Prices")
#COMBINING ELECTRICITY & PRICE DATA INTO ONE DATAFRAME
CA_DSP_Energy_Price <- cbind(CA_DSP_Energy, CA_DSP_Prices$Prices)
colnames(CA_DSP_Energy_Price)[colnames(CA_DSP_Energy_Price)=="CA_DSP_Prices$Prices"]<-"Prices"
# Change column names
colnames(CA_DSP_Energy_Price) <- c("Date", "Cap_Resid", "Cap_Commercial", "Cap_Indus", "Cap_Total", "Gen_Resid", "Gen_Commercial", "Gen_Indus", "Gen_Total", "Prices")
glimpse(CA_DSP_Energy_Price)
## Rows: 28
## Columns: 10
## $ Date <date> 2015-03-01, 2015-06-01, 2015-09-01, 2015-12-01, 2016-0…
## $ Cap_Resid <dbl> 1770, 1970, 2194, 2442, 2692, 2937, 3146, 3378, 3531, 3…
## $ Cap_Commercial <dbl> 745, 759, 799, 854, 845, 896, 962, 1074, 1113, 1233, 13…
## $ Cap_Indus <dbl> 520, 543, 562, 603, 676, 718, 762, 806, 881, 924, 963, …
## $ Cap_Total <dbl> 3034, 3272, 3555, 3900, 4213, 4552, 4871, 5258, 5524, 5…
## $ Gen_Resid <dbl> 271296, 352600, 343706, 236140, 404330, 533507, 479778,…
## $ Gen_Commercial <dbl> 121327, 143331, 134444, 89947, 136825, 170152, 157598, …
## $ Gen_Indus <dbl> 85697, 104902, 97252, 64089, 111361, 138795, 131947, 85…
## $ Gen_Total <dbl> 478320, 600833, 575401, 390176, 652516, 842454, 769323,…
## $ Prices <dbl> 4.67, 4.48, 4.33, 4.27, 4.04, 3.91, 3.89, 3.79, 3.77, 3…
head(CA_DSP_Energy_Price)
## Date Cap_Resid Cap_Commercial Cap_Indus Cap_Total Gen_Resid
## 1 2015-03-01 1770 745 520 3034 271296
## 2 2015-06-01 1970 759 543 3272 352600
## 3 2015-09-01 2194 799 562 3555 343706
## 4 2015-12-01 2442 854 603 3900 236140
## 5 2016-03-01 2692 845 676 4213 404330
## 6 2016-06-01 2937 896 718 4552 533507
## Gen_Commercial Gen_Indus Gen_Total Prices
## 1 121327 85697 478320 4.67
## 2 143331 104902 600833 4.48
## 3 134444 97252 575401 4.33
## 4 89947 64089 390176 4.27
## 5 136825 111361 652516 4.04
## 6 170152 138795 842454 3.91
##Part 1: Spatial Analysis
This section explores the spatial distribution of solar footprints across California counties. The map, “Solar Footprint Distribution across Counties in Ca”, shows that solar footprints are concentrated around cities such as San Francisco and Los Angeles, as well as further inland around Bakersfield.
##Part 2: Relationship between Energy and prices
The plots below show how distributed solar PV capacity, generation and prices changed from 2015 to 2021. Capacity increases over time, generation shows an upward trend with a seasonal component, and prices follow a U-shaped curve: they are initially high, fall from 2018 to 2020, and then rise again.
[Plot: "California Energy Prices"; x-axis: Date, y-axis: Prices]
Hypothesis: the county in California with the largest area in acres of large-scale solar footprints will be in a rural area
The map of solar footprints across California counties shows a clear pattern: smaller solar installations tend to cluster around urban and metropolitan centers, while larger solar footprints dominate in rural areas. Kern County has the largest area covered by solar footprints (30,831.1 acres), with ground-mounted installations being the dominant type. Notably, Kern County is classified as a rural area. This supports our hypothesis that the largest area covered by large-scale solar footprints would be found in a rural setting.
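A county-level total like the one reported for Kern County can be obtained by summing footprint areas per county. The sketch below shows the idea on a toy table; the name `solar_toy`, its column names, and the acreages are hypothetical and are not taken from the solar footprint dataset.

```r
# Toy footprint table (hypothetical acreages and column names)
solar_toy <- data.frame(
  County = c("Kern", "Kern", "Los Angeles", "Los Angeles", "Fresno"),
  Type   = c("Ground", "Ground", "Rooftop", "Parking", "Ground"),
  Acres  = c(1200.5, 800.0, 3.2, 1.5, 450.0),
  stringsAsFactors = FALSE
)

# Sum footprint area within each county
county_area <- aggregate(Acres ~ County, data = solar_toy, FUN = sum)

# Sort from largest to smallest total area and take the top county
county_area <- county_area[order(-county_area$Acres), ]
largest <- county_area$County[1]
```

The same aggregation can be repeated with `Acres ~ County + Type` to see which installation type dominates within each county.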
Hypothesis: Counties with high minority populations will have larger areas in acres of solar footprints than counties with low minority populations.
It is difficult to draw specific conclusions about minority populations and solar footprint densities from the map of minority population sizes and solar footprint distribution across counties. However, the map shows that the largest minority population is in Los Angeles County, which motivates further analysis of the different types of solar footprints in this county, as illustrated in the la_solar_types map of LA County. On that map, ground solar footprints are found mostly toward Lancaster, while parking lot and rooftop solar footprints are found closer to the coast.
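One way to make this comparison less dependent on reading the map would be to relate each county's minority share to its solar acreage. The sketch below is only an illustration of that idea, not part of the analysis above: the population figures are the `E_TOTPOP` and `E_MINRTY` values shown earlier for four counties, while the `Solar_Acres` column is hypothetical.

```r
# Population values for four counties as shown in the joined table;
# Solar_Acres is a hypothetical placeholder, not a measured value
share_toy <- data.frame(
  County      = c("Alameda", "Alpine", "Butte", "Fresno"),
  E_TOTPOP    = c(1643700, 1146, 227075, 978130),
  E_MINRTY    = c(1120309, 468, 62685, 686675),
  Solar_Acres = c(50, 0, 120, 900),
  stringsAsFactors = FALSE
)

# Minority share puts large and small counties on a comparable scale
share_toy$Minority_Share <- share_toy$E_MINRTY / share_toy$E_TOTPOP

# Rank-based correlation between minority share and solar acreage
rho <- cor(share_toy$Minority_Share, share_toy$Solar_Acres, method = "spearman")
```

A rank correlation is used here because county populations and solar acreages are both highly skewed; with all 58 counties it would give a single summary number to set beside the map.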
##Part 2: Relationship between Energy and prices
Research question: What is the relationship between annual distributed solar PV capacity and generation and electricity prices between 2015 and 2021?
Statistical testing:
Null Hypothesis (H0): There is no significant relationship between solar PV generation/capacity and energy prices.
Alternative Hypothesis (H1): There is a significant relationship between solar PV generation/capacity and energy prices.
Our GLM analysis finds no significant relationship between solar PV prices and generation and/or capacity, either in total or in any of the three segments (residential, commercial, industrial). In all 12 GLMs, the p-values for the capacity and generation coefficients are greater than 0.05, so we fail to reject the null hypothesis.
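The p-value check across models can be condensed into a loop. The sketch below is a hedged illustration on simulated data, so its numbers do not correspond to the study's results; the data frame `toy` and its values are assumptions, and only the column naming follows the combined dataframe.

```r
# Simulated quarterly data (28 rows, not the study's values)
set.seed(1)
toy <- data.frame(
  Prices    = rnorm(28, mean = 4, sd = 0.25),
  Cap_Total = seq(3000, by = 300, length.out = 28),
  Gen_Total = runif(28, 4e5, 2e6)
)

predictors <- c("Cap_Total", "Gen_Total")

# Fit one single-predictor model per variable and keep the slope p-value
p_values <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Prices"), data = toy)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
})

# TRUE where the null hypothesis would be rejected at the 0.05 level
significant <- p_values < 0.05
```

Extending `predictors` to the residential, commercial and industrial columns would cover the single-predictor models in one pass and make the comparison against 0.05 explicit.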
#GLM between Price vs Capacity and Generation
DSP_price_capgen_glm <- lm(Prices ~ Cap_Total + Gen_Total, data = CA_DSP_Energy_Price)
print(summary(DSP_price_capgen_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Total + Gen_Total, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30609 -0.18534 -0.04876 0.19840 0.62439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.098e+00 1.399e-01 29.291 <2e-16 ***
## Cap_Total -7.904e-06 3.087e-05 -0.256 0.800
## Gen_Total -5.971e-08 1.894e-07 -0.315 0.755
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2514 on 25 degrees of freedom
## Multiple R-squared: 0.03515, Adjusted R-squared: -0.04203
## F-statistic: 0.4554 on 2 and 25 DF, p-value: 0.6393
DSP_price_cap_glm <- lm(Prices ~ Cap_Total, data = CA_DSP_Energy_Price)
print(summary(DSP_price_cap_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Total, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28723 -0.19911 -0.06039 0.18021 0.62581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.092e+00 1.363e-01 30.025 <2e-16 ***
## Cap_Total -1.589e-05 1.733e-05 -0.917 0.368
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.247 on 26 degrees of freedom
## Multiple R-squared: 0.03132, Adjusted R-squared: -0.005941
## F-statistic: 0.8405 on 1 and 26 DF, p-value: 0.3677
DSP_price_gen_glm <- lm(Prices ~ Gen_Total, data = CA_DSP_Energy_Price)
print(summary(DSP_price_gen_glm))
##
## Call:
## lm(formula = Prices ~ Gen_Total, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31667 -0.19417 -0.05023 0.19696 0.63468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.083e+00 1.243e-01 32.842 <2e-16 ***
## Gen_Total -9.950e-08 1.063e-07 -0.936 0.358
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2468 on 26 degrees of freedom
## Multiple R-squared: 0.03262, Adjusted R-squared: -0.004583
## F-statistic: 0.8768 on 1 and 26 DF, p-value: 0.3577
#GLM Between Price vs Residential Capacity and Generation
DSP_price_residcapgen_glm <- lm(Prices ~ Cap_Resid + Gen_Resid, data = CA_DSP_Energy_Price)
print(summary(DSP_price_residcapgen_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Resid + Gen_Resid, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31006 -0.18445 -0.05033 0.20008 0.62481
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.091e+00 1.360e-01 30.084 <2e-16 ***
## Cap_Resid -8.375e-06 4.651e-05 -0.180 0.859
## Gen_Resid -1.149e-07 2.974e-07 -0.386 0.703
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2515 on 25 degrees of freedom
## Multiple R-squared: 0.03448, Adjusted R-squared: -0.04276
## F-statistic: 0.4464 on 2 and 25 DF, p-value: 0.6449
DSP_price_residcap_glm <- lm(Prices ~ Cap_Resid, data = CA_DSP_Energy_Price)
print(summary(DSP_price_residcap_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Resid, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28775 -0.20114 -0.06269 0.17835 0.62728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.084e+00 1.323e-01 30.859 <2e-16 ***
## Cap_Resid -2.307e-05 2.631e-05 -0.877 0.389
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2473 on 26 degrees of freedom
## Multiple R-squared: 0.02872, Adjusted R-squared: -0.008638
## F-statistic: 0.7688 on 1 and 26 DF, p-value: 0.3886
DSP_price_residgen_glm <- lm(Prices ~ Gen_Resid, data = CA_DSP_Energy_Price)
print(summary(DSP_price_residgen_glm))
##
## Call:
## lm(formula = Prices ~ Gen_Resid, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31704 -0.19509 -0.05141 0.19940 0.63200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.081e+00 1.215e-01 33.592 <2e-16 ***
## Gen_Resid -1.587e-07 1.678e-07 -0.945 0.353
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2468 on 26 degrees of freedom
## Multiple R-squared: 0.03323, Adjusted R-squared: -0.003956
## F-statistic: 0.8936 on 1 and 26 DF, p-value: 0.3532
#GLM Between Price vs Commercial Capacity and Generation
DSP_price_commercialcapgen_glm <- lm(Prices ~ Cap_Commercial + Gen_Commercial, data = CA_DSP_Energy_Price)
print(summary(DSP_price_commercialcapgen_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Commercial + Gen_Commercial, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29369 -0.18409 -0.04952 0.18728 0.63252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.091e+00 1.362e-01 30.034 <2e-16 ***
## Cap_Commercial -4.779e-05 1.462e-04 -0.327 0.746
## Gen_Commercial -1.443e-07 8.644e-07 -0.167 0.869
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2518 on 25 degrees of freedom
## Multiple R-squared: 0.03196, Adjusted R-squared: -0.04548
## F-statistic: 0.4127 on 2 and 25 DF, p-value: 0.6663
DSP_price_commercialcap_glm <- lm(Prices ~ Cap_Commercial, data = CA_DSP_Energy_Price)
print(summary(DSP_price_commercialcap_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Commercial, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28731 -0.19139 -0.05538 0.17637 0.63279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.088e+00 1.329e-01 30.75 <2e-16 ***
## Cap_Commercial -6.856e-05 7.533e-05 -0.91 0.371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2471 on 26 degrees of freedom
## Multiple R-squared: 0.03088, Adjusted R-squared: -0.006395
## F-statistic: 0.8284 on 1 and 26 DF, p-value: 0.3711
DSP_price_commercialgen_glm <- lm(Prices ~ Gen_Commercial, data = CA_DSP_Energy_Price)
print(summary(DSP_price_commercialgen_glm))
##
## Call:
## lm(formula = Prices ~ Gen_Commercial, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3094 -0.1889 -0.0482 0.1880 0.6440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.073e+00 1.225e-01 33.252 <2e-16 ***
## Gen_Commercial -3.848e-07 4.461e-07 -0.863 0.396
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2474 on 26 degrees of freedom
## Multiple R-squared: 0.02782, Adjusted R-squared: -0.00957
## F-statistic: 0.7441 on 1 and 26 DF, p-value: 0.3962
#GLM Between Price vs Industrial Capacity and Generation
DSP_price_industrialcapgen_glm <- lm(Prices ~ Cap_Indus + Gen_Indus, data = CA_DSP_Energy_Price)
print(summary(DSP_price_industrialcapgen_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Indus + Gen_Indus, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30389 -0.19595 -0.05164 0.20732 0.60561
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.157e+00 1.673e-01 24.849 <2e-16 ***
## Cap_Indus -1.390e-04 2.459e-04 -0.565 0.577
## Gen_Indus -2.411e-07 1.290e-06 -0.187 0.853
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2495 on 25 degrees of freedom
## Multiple R-squared: 0.0494, Adjusted R-squared: -0.02664
## F-statistic: 0.6496 on 2 and 25 DF, p-value: 0.5308
DSP_price_industrialcap_glm <- lm(Prices ~ Cap_Indus, data = CA_DSP_Energy_Price)
print(summary(DSP_price_industrialcap_glm))
##
## Call:
## lm(formula = Prices ~ Cap_Indus, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.2917 -0.2031 -0.0576 0.1963 0.6059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1548813 0.1636575 25.388 <2e-16 ***
## Cap_Indus -0.0001747 0.0001524 -1.146 0.262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2449 on 26 degrees of freedom
## Multiple R-squared: 0.04807, Adjusted R-squared: 0.01146
## F-statistic: 1.313 on 1 and 26 DF, p-value: 0.2623
DSP_price_industrialgen_glm <- lm(Prices ~ Gen_Indus, data = CA_DSP_Energy_Price)
print(summary(DSP_price_industrialgen_glm))
##
## Call:
## lm(formula = Prices ~ Gen_Indus, data = CA_DSP_Energy_Price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32704 -0.19794 -0.05025 0.19860 0.63310
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.106e+00 1.387e-01 29.609 <2e-16 ***
## Gen_Indus -8.065e-07 8.042e-07 -1.003 0.325
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2462 on 26 degrees of freedom
## Multiple R-squared: 0.03724, Adjusted R-squared: 0.0002152
## F-statistic: 1.006 on 1 and 26 DF, p-value: 0.3252
This result is supported by the plots of price against capacity and generation, which are highly scattered with no visible trend.
#Capacity
# Plotting Solar Capacity vs. Electricity Prices
plot(CA_DSP_Energy_Price$Cap_Total, CA_DSP_Energy_Price$Prices,
xlab = "Solar PV Capacity", ylab = "Electricity Prices")
title("Figure 1. Solar PV Capacity vs Electricity Prices 2015-2021")
# Fitting linear regression model
DSP_price_cap_glm <- lm(Prices ~ Cap_Total, data = CA_DSP_Energy_Price)
# Adding a regression line
abline(DSP_price_cap_glm, col = "red")
#Generation
# Plotting Solar PV Generation vs. Electricity Prices
plot(CA_DSP_Energy_Price$Gen_Total, CA_DSP_Energy_Price$Prices,
xlab = "Solar PV Generation", ylab = "Electricity Prices")
title("Figure 2. Solar PV Generation vs Electricity Prices 2015-2021")
# Fitting linear regression model
DSP_price_gen_glm <- lm(Prices ~ Gen_Total, data = CA_DSP_Energy_Price)
# Adding a regression line
abline(DSP_price_gen_glm, col = "red")
Understanding the spatial distribution of large-scale solar footprints in California counties is critical for addressing both social and environmental issues. By examining the distribution, we can identify areas of overlap with minority populations and address climate justice issues to ensure an equitable distribution of renewable energy for all. This study addresses two spatial analysis questions: (1) the relationship between solar footprint size and location in rural or urban counties, and (2) the spatial distribution of industrial solar footprints and minority population size.
This analysis revealed trends in the spatial distribution of solar footprints across California counties. Smaller solar installations clustered around urban areas, while larger solar footprints were found in rural areas. Kern County had the largest area covered by solar footprints, which supported the hypothesis that the county in California with the largest area in acres of large-scale solar footprints would be in a rural area. When analyzing minority population sizes, Los Angeles County was identified as having the largest minority population of any county. Within Los Angeles County, ground solar footprints were clustered inland near Lancaster, while parking lot and rooftop installations were concentrated along the coast. However, it was difficult to draw specific conclusions about minority populations and solar footprint densities from the map of minority population sizes and solar footprint distribution across counties. Overall, this analysis provides insight into how energy infrastructure and socioeconomic factors are distributed in California.
##Part 2: Relationship between Energy and prices
The GLM results and data plots show no statistically significant linear relationship between the price of installing solar panels and either installed capacity or the electricity generated. This suggests that drivers other than price, probably including policies, infrastructure, and public perception, shape the uptake of solar energy.